Abstract. In 2003, Leonid A. Levin presented the idea of a combinatorial complete
one-way function and a sketch of the proof that Tiling represents such a function. In 
this paper, we present two new one-way functions based on semi-Thue string rewriting 
systems and a version of the Post Correspondence Problem and prove their completeness. 
Besides, we present an alternative proof of Levin's result. We also discuss the properties 
a combinatorial problem should have in order to hold a complete one-way function. 



1. Introduction 

In computer science, complete objects play an extremely important role. If a certain 
class of problems has a complete representative, one can shift the analysis from the whole 
class (where usually nothing can really be proven) to this certain, well-specified complete 
problem. Examples include Satisfiability and Graph Coloring for NP (see [GJ79] for a
survey) or, which is more closely related to our present work, Post Correspondence and
Matrix Transformation problems for DistNP [Gur91, BG95].

However, there are problems that are undoubtedly complete for their complexity classes 
but do not actually cause such a nice concept shift because they are too hard to analyze. 
Such problems usually come from diagonalization procedures and require enumeration of 
all Turing machines or all problems of a certain complexity class. 

Our results lie in the field of cryptography. For a long time, little has been known
about complete problems in cryptography. While "conventional" complexity classes got
their complete representatives relatively soon, it took thirty years from the definition
of a public-key cryptosystem [DH76] to present a complete problem for the class of all
public-key cryptosystems [HKN+05, GHP06]. However, this complete problem is of the
"bad" kind of complete problems: it requires enumerating all Turing machines and can hardly
be put to any use, be it practical implementation or theoretical complexity analysis.
Before tackling public-key cryptosystems, it is natural to ask about a seemingly simpler
object: one-way functions (public-key cryptography is equivalent to the existence of a
trapdoor function, a particular case of a one-way function). The first big step towards useful
complete one-way functions was taken by Leonid A. Levin, who provided a construction of
the first known complete one-way function [Lev87] (see also [Gol99]).

The construction uses a universal Turing machine U to compute the following function: 

$f_{uni}(\mathrm{desc}(M), x) = (\mathrm{desc}(M), M(x)),$

where desc(M) is the description of a Turing machine M. If there are one-way functions
among the machines M (and it is easy to show that if there are any, there are one-way functions that
run in, say, quadratic time), then $f_{uni}$ is a (weak) one-way function.

As the reader has probably already noticed, this complete one-way function is of the
"useless" kind we have been talking about. Naturally, Levin asked whether it is possible to
find "combinatorial" complete one-way functions, functions that would not depend on
enumerating Turing machines or giving their descriptions as input. For 15 years, the problem
remained open, and then it was resolved by Levin himself [Lev03]. Levin devised a clever trick
of having determinism in one direction and indeterminism in the other.

Having shown that a modified Tiling problem is in fact a complete one-way function,
Levin asked for other combinatorial complete one-way functions. In this work, we
answer this open question. We take Levin's considerations further to show how a complete
one-way function may be derived from string-rewriting problems shown to be average-case
complete in [Wan95] and from a variation of the Post Correspondence Problem. Moreover, we
discuss the general properties a combinatorial problem should enjoy in order to contain a
complete one-way function by similar arguments.

2. Distributional Accessibility problem for semi-Thue systems 

Consider a finite alphabet $A$. An ordered pair of strings $(g, h)$ over $A$ is called a
rewriting rule (sometimes also called a production). We write these pairs as $g \to h$ because
we interpret them as rewriting rules for other strings. Namely, for two strings $u$, $v$ we write
$u \Rightarrow_{g \to h} v$ if $u = agb$, $v = ahb$ for some $a, b \in A^*$. A set of rewriting rules is called a
semi-Thue system. For a semi-Thue system $R$, we write $u \Rightarrow_R v$ if $u \Rightarrow_{g \to h} v$ for some
rewriting rule $(g, h) \in R$. Slightly abusing notation, we extend it and write $u \Rightarrow_R v$ if there
exists a finite sequence of rewriting rules $(g_1, h_1), \ldots, (g_m, h_m) \in R$ such that

$u = u_0 \Rightarrow_{g_1 \to h_1} u_1 \Rightarrow_{g_2 \to h_2} u_2 \Rightarrow \cdots \Rightarrow_{g_m \to h_m} u_m = v.$
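
The one-step relation and bounded derivations can be illustrated with a small Python sketch. It is not part of the original construction: the names `one_step` and `derives_within` are hypothetical, strings are ordinary Python strings, and the search is a plain breadth-first exploration that may take exponential time. The sketch also assumes that a derivation of zero steps counts, so every string derives itself.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str]  # (g, h) stands for the rewriting rule g -> h


def one_step(u: str, rules: List[Rule]) -> Set[str]:
    """All strings v obtained from u by replacing one occurrence of some g by h."""
    results = set()
    for g, h in rules:
        start = u.find(g)
        while start != -1:
            results.add(u[:start] + h + u[start + len(g):])
            start = u.find(g, start + 1)
    return results


def derives_within(u: str, v: str, rules: List[Rule], n: int) -> bool:
    """Breadth-first search: can v be reached from u in at most n steps?"""
    frontier = {u}
    seen = {u}
    for _ in range(n + 1):
        if v in frontier:
            return True
        # Move one step further, skipping strings that were already visited.
        frontier = {w for x in frontier for w in one_step(x, rules)} - seen
        seen |= frontier
    return False
```

For the single rule ab -> ba, the string aab derives baa in two steps (aab, aba, baa) but not in one.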

For a more detailed discussion of semi-Thue systems we refer the reader to [BO93].
We can now define the distributional accessibility problem for semi-Thue systems: 
Instance. A semi-Thue system $R = \{(g_1, h_1), \ldots, (g_m, h_m)\}$, two binary
strings $u$ and $v$, a positive integer $n$. The size of the instance is
$n + |u| + |v| + \sum_{i=1}^{m} (|g_i| + |h_i|)$.

Question. Is $u \Rightarrow_R v$ in at most $n$ steps?

Distribution. Randomly and independently choose positive integers $n$
and $m$ and binary strings $u$ and $v$. Then randomly and independently
choose binary strings $g_1, h_1, \ldots, g_m, h_m$. Integers and strings are chosen
with the default uniform probability distribution, namely the distribution
proportional to $1/n^2$ for integers and proportional to $2^{-|u|}/|u|^2$ for binary
strings.

In [WB95], this problem was shown to be complete for DistNP.

For what follows, we also need another notion of derivation in semi-Thue systems.
Namely, for a semi-Thue system $R$ we write $u \Rightarrow^*_R v$ if $u = agb$, $v = ahb$ for some $(g, h) \in R$
and, moreover, there does not exist another rewriting rule $(g', h') \in R$ such that $u = a'g'b'$
and $v = a'h'b'$ for some $a', b' \in A^*$. Similarly to $\Rightarrow_R$, we extend $\Rightarrow^*_R$ to finite chains of
derivations. In other words, $u \Rightarrow^*_R v$ if $u \Rightarrow_R v$, and on each step of this derivation there
was only one applicable rewriting rule. This uniqueness (or, better to say, determinism) is
crucial to perform Levin's trick. We also write $u \Rightarrow^{*,n}_R v$ if $u \Rightarrow^*_R v$ in at most $n$ steps.
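
A deterministic derivation can be followed mechanically, which is what makes the later functions easy to compute. The Python sketch below is only an illustration under stated assumptions: the names `applications` and `deterministic_run` are hypothetical, and two occurrences of the same rule at different positions are treated as nondeterminism, which is a slightly stricter reading than the definition above.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (g, h) stands for the rewriting rule g -> h


def applications(u: str, rules: List[Rule]) -> List[Tuple[int, str, str]]:
    """Every (position, g, h) such that g occurs in u at that position."""
    found = []
    for g, h in rules:
        start = u.find(g)
        while start != -1:
            found.append((start, g, h))
            start = u.find(g, start + 1)
    return found


def deterministic_run(u: str, rules: List[Rule], n: int) -> Tuple[str, str]:
    """Follow the unique derivation from u for at most n steps.

    Returns the last string reached and a status:
    "halted" (no rule applies), "ambiguous" (more than one application),
    or "timeout" (n steps were made and a rule still applies).
    """
    current = u
    steps = 0
    while True:
        apps = applications(current, rules)
        if not apps:
            return current, "halted"
        if len(apps) > 1:
            return current, "ambiguous"
        if steps == n:
            return current, "timeout"
        pos, g, h = apps[0]
        current = current[:pos] + h + current[pos + len(g):]
        steps += 1
```

With the single rule ab -> ba, the string ab halts at ba after one step, while abab is rejected immediately because the rule applies at two positions.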

3. Post Correspondence Problem 

The following problem was proven to be complete for DistNP in [Gur91] (see also
Remark 2 in [BG95]):

Instance. A positive integer $m$, pairs $\Gamma = \{(u_1, v_1), \ldots, (u_m, v_m)\}$, a
binary string $x$, a positive integer $n$. The size of the instance is
$n + |x| + \sum_{i=1}^{m} (|u_i| + |v_i|)$.

Question. Is $u_{i_1} \cdots u_{i_k} = x v_{i_1} \cdots v_{i_k}$ for some $k \le n$?

Distribution. Randomly and independently choose positive integers $n$
and $m$ and a binary string $x$. Then randomly and independently choose
binary strings $u_1, v_1, \ldots, u_m, v_m$. Integers and strings are chosen with
the default uniform probability distribution.

We need a modification of this problem. Namely, we pose the question as follows: does

$u_{i_1} \cdots u_{i_k} y = x v_{i_1} \cdots v_{i_k}$

hold for some $y$? If we remove the bound $n$, this problem is undecidable, but the
bounded version is not known to be complete for DistNP.

Given a nonempty list $\Gamma = ((u_1, v_1), \ldots, (u_m, v_m))$ of pairs of strings, it will be
convenient to view the function based on the modified Post Correspondence Problem as a derivation
with pairs from $\Gamma$ as inference rules. A string $x$ yields a string $y$ in one step if there is a pair
$(u, v)$ in $\Gamma$ such that $uy = xv$. The "yield" relation $\vdash_\Gamma$ is defined as the transitive closure
of the "yield-in-one-step" relation.

To perform Levin's trick, we need to get rid of the indeterminism. This time, the
description of a deterministic version of $\vdash_\Gamma$ is more complicated than in the case of
semi-Thue systems. If we simply required it to be deterministic, we would not be able to move
the head of the Turing machine to the left. To solve this problem, we have to look ahead by
one step: if one of the two branches fails in two steps, we consider the choice deterministic.

Formally speaking, we write $x \vdash^*_\Gamma y$ if there are no more than two pairs $(p, s), (p', s') \in \Gamma$
such that $py = xs$ and $p'y' = xs'$ for some strings $y$, $y'$ (where $y \ne y'$, but $p$ may equal $p'$;
two possible different applications of the same rule are still nondeterministic) and, moreover,
we cannot apply any rule in $\Gamma$ to $y'$. We write $u \vdash^{*,n}_\Gamma v$ if $u \vdash^*_\Gamma v$ in not more than $n$ steps.
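
The yield relation and its one-step lookahead can be sketched in Python. This is one possible reading of the definition above, not the authors' implementation: the names `successors` and `deterministic_successor` are hypothetical, and a branch is taken to "fail" when no pair applies to the string it produces.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (u, v): x yields y in one step when u + y == x + v


def successors(x: str, pairs: List[Pair]) -> List[str]:
    """All distinct strings y such that u + y == x + v for some pair (u, v)."""
    out: List[str] = []
    for u, v in pairs:
        xv = x + v
        if xv.startswith(u):
            y = xv[len(u):]
            if y not in out:
                out.append(y)
    return out


def deterministic_successor(x: str, pairs: List[Pair]) -> Optional[str]:
    """The successor of x if the choice is deterministic, otherwise None.

    A single candidate is accepted as is. With two candidates, the step is
    accepted only when exactly one of them can be continued; the other one
    is then a dead end and is discarded.
    """
    candidates = successors(x, pairs)
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) == 2:
        alive = [y for y in candidates if successors(y, pairs)]
        if len(alive) == 1:
            return alive[0]
    return None
```

For the pairs (a, b) and (ab, a), the string ab has the two successors bb and a. No pair applies to bb, so the lookahead selects a.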

4. Complete One-Way Tiling Function

Before presenting our own construction, we recall Levin's complete one-way function
from [Lev03]. In fact, we slightly modify Levin's construction and present an alternative
proof based on ideas from [Wan99]. The difference from Levin's original construction is
that he considered the tiling function for tiles with marked corners: the corners of
tiles, instead of edges, are marked with symbols. In the tiling of an $n \times n$ square, symbols
on touching corners of adjacent tiles should match.

A tile is a square with a symbol from a finite alphabet $A$ on each side, which may not be
turned over or rotated. We assume that there exist infinitely many copies of each tile. By a tiling
of an $n \times n$ square we mean a set of $n^2$ tiles covering the square in which the symbols on
the common sides of adjacent tiles are the same.

It will be convenient for us to consider Tiling as a string transformation system. Fix a
finite set of tiles $T$. We say that $T$ transforms a string $x$ to $y$, $|x| = |y|$, if there is a tiling
of an $|x| \times |x|$ square with $x$ on the bottom and $y$ on top. We write $x \to_T y$ in this case.
By a tiling process we mean the completion of a partially tiled square by one tile at a
time. Similarly to semi-Thue systems, we define $x \to^*_T y$ if and only if $x \to_T y$ with an
additional restriction: we permit the extension of a partially tiled square only if the possible
extension is unique.

Definition 4.1. The Tiling simulating function (Tiling) is the function $f : A^* \to A^*$
defined as follows:

• if the input has the form $(T, x)$ for a finite set of tiles $T$ and a string $x$, then:

- if $x \to^*_T y$, then $f(T, x) = (T, y)$;

- otherwise, $f$ returns its input;

• otherwise, $f$ returns its input.

Theorem 4.2. If one-way functions exist, then Tiling is a weakly one-way function. 

Proof. Let $Q$ be the set of states of a Turing machine $M$, $s$ the initial state of $M$, $h$
the halting state, $\pi_M$ the transition function of $M$, and $\{0, 1, B\}$ the tape symbols. By \$
we denote the begin marker and by # the end marker. We also introduce a new symbol
for each pair from $Q \times \{0, 1, B\}$. We now present the construction of a tileset $T_M$.

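(1) For each tape symbol $a \in \{0, 1, B\}$ we add

[Tile diagram lost in extraction; surviving edge labels: $a$, $(h, a)$, $a$.]

(2) For each $a, b, c \in \{0, 1, B\}$, $q \in Q \setminus \{h\}$, $p \in Q$, if $\pi_M(q, a) = (p, b, R)$ we add

[Tile diagram lost in extraction; surviving edge labels: $(p, c)$, $(q, a)$, $c$.]

(3) For each $a, b, c \in \{0, 1, B\}$, $q \in Q \setminus \{h\}$, $p \in Q$, if $\pi_M(q, a) = (p, b, L)$ we add

[Tile diagram lost in extraction; surviving edge labels: $p$, $(q, a)$.]

(4) Finally, for \$ and # we add

[Tile diagram lost in extraction; surviving edge labels: #, #, #.]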



The following lemma is now obvious. 

Lemma 4.3. For a deterministic Turing machine $M$ that works $n^2$ steps and its
corresponding tiling system $T_M$,

$M(x) = y$, $|x| = |y|$, if and only if $\$sxB^{n(n-1)}\# \to^*_{T_M} \$hyB^{n(n-1)}\#$.

The rest of the proof closely follows [Gol99]. Suppose that $g$ is a length-preserving
one-way function that, on inputs of length $n$, works for time not exceeding $n^2$. By Lemma 4.3,
there exists a finite system of tiles $T_M$ such that $\$sxB^{n(n-1)}\# \to^*_{T_M} \$hyB^{n(n-1)}\#$ is
equivalent to $g(x) = y$. Therefore, with constant probability solving Tiling is equivalent to
inverting $g$. □



5. A complete one-way function based on semi-Thue systems 

Our complete one-way function is based upon the distributional accessibility problem 
for semi-Thue systems. First, we need to make this decision problem a function and then 
add Levin's trick in order to assure length-preservation. 

Definition 5.1. The semi-Thue accessibility function (STAF) is the function $f : A^* \to A^*$
defined as follows:

• if the input has the form $((g_1, h_1), \ldots, (g_m, h_m), x)$, consider the semi-Thue system
$\Gamma = ((g_1, h_1), \ldots, (g_m, h_m))$ and:

- if $x \Rightarrow^{*,t}_\Gamma y$ with $t = |x|^2 + 4|x| + 2$, there are no rewriting rules in $\Gamma$ that may be
applied to $y$, and $|y| = |x|$, then $f(\Gamma, x) = (\Gamma, y)$;

- otherwise, $f$ returns its input;

• otherwise, $f$ returns its input.

Obviously, STAF is easy to compute: one simply needs to use the first part of the input
as a semi-Thue system (if that is impossible, return the input) and apply its rules until either
there are two rules that apply, or we have worked for $|x|^2 + 4|x| + 2$ steps, or $y$ has been
reached and no other rules can be applied. In the first two cases, return the input. In the third
case, check that $|y| = |x|$ and return $(\Gamma, y)$ if so and the input otherwise.
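
The procedure just described can be written out as a short Python sketch. It is a minimal illustration under stated assumptions rather than a reference implementation: the name `staf` is hypothetical, the input is assumed to be already parsed into a tuple of rules and a string, and two occurrences of the same rule count as "two rules that apply".

```python
from typing import Tuple

Rule = Tuple[str, str]        # (g, h) stands for the rewriting rule g -> h
System = Tuple[Rule, ...]


def staf(system: System, x: str) -> Tuple[System, str]:
    """Sketch of STAF on an already parsed input (system, x)."""
    bound = len(x) ** 2 + 4 * len(x) + 2
    y = x
    steps = 0
    while True:
        # Every way to apply a rule: (position, left side, right side).
        apps = [(i, g, h)
                for g, h in system
                for i in range(len(y) - len(g) + 1)
                if y.startswith(g, i)]
        if not apps:
            # Halted: accept the result only if the length is preserved.
            return (system, y) if len(y) == len(x) else (system, x)
        if len(apps) > 1 or steps == bound:
            # Nondeterminism or step budget exhausted: return the input.
            return (system, x)
        i, g, h = apps[0]
        y = y[:i] + h + y[i + len(g):]
        steps += 1
```

Each iteration scans the string once per rule, so the whole computation is polynomial in the size of the input, in line with the claim that STAF is easy to compute.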
Theorem 5.2. If one-way functions exist, then STAF is a weakly one-way function.

Proof. This time we need to encode Turing machines into the string-rewriting setting.
Following [Gur91, WB95, Wan99], we have the following proposition:

Proposition 5.3. For any finite alphabet $A$ with $|A| > 2$ and any pair of binary strings
$x$ and $y$ there exists a dynamic binary coding scheme of $A$ with $\{0, 1\}$ with the following
properties.

(1) All codes (binary codes of symbols of $A$) have the same length $l = 2 \log |x| + O(1)$.

(2) Both strings $x$ and $y$ are distinguishable from every code, that is, no code is a
substring of $x$ or $y$.

(3) If a nonempty suffix $z$ of a code $u$ is a prefix of a code $v$, then $z = u = v$ (one can
always distinguish where a code ends and another code begins).

(4) Strings $x$ and $y$ can be written as a unique concatenation of binary strings 1, 10,
000, and 100, which are not prefixes of any code.

Now let us define the semi-Thue system $R_M$ that corresponds to a Turing machine
$M$. The rewriting rules are divided into three parts: $R_M = R_1 \cup R_2 \cup R_3$. Let us denote
$B = \{1, 10, 100, 000\}$, fix a dynamic binary coding scheme, and denote by $\overline{w}$ the encoding
of $w$ in this scheme.

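$R_1$ consists of the following rules for each $u \in B$:

$s u \to \$\,\overline{u}\,s_1$,
$s_1 u \to \overline{u}\,s_1$,
$\overline{u}\,s_1 \$ \to s_2\,\overline{u}\,\$$,
$\overline{u}\,s_2 \to s_2\,\overline{u}$,
$\$\,s_2 \to \$\,\overline{s}$.

These rules are needed to rewrite the initial string $sx\$$ into $\$\,\overline{s}\,\overline{x}\,\$$. Since $x$ can be uniquely
written as $u_1 \ldots u_m$ for some $u_i \in B$, this transformation can be carried out in
$2m + 1 \le 2|x| + 1$ steps.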

$R_2$ consists of rewriting rules corresponding to Turing machine instructions. By $h$ we
denote the halting state, by $s$ the initial state, by $B$ the blank symbol, by $Q_M$ the
set of states of $M$, by $\pi_M$ the transition function of $M$, and by \$ the begin/end marker.
Then $R_2$ consists of the following pairs:

(1) For each state $q \in Q_M \setminus \{h\}$, $p \in Q$, $a, b, c \in \{0, 1, B\}$:

$\pi_M(q, a) = (p, b, R) \Longrightarrow qac \to bpc,\ qa\$ \to bpB\$ \in R_2$.

(2) For each state $q \in Q_M \setminus \{h\}$, $p \in Q$, $a, b, d \in \{0, 1, B\}$ and $c \in \{0, 1, \$\}$:

$\pi_M(q, a) = (p, b, L) \Longrightarrow dqac \to pdbc,\ dqB\$ \to pdbB\$ \in R_2$

for $a \ne B$, $c \ne \$$, or $b \ne B$.

$R_1$ and $R_2$ are completely similar to the construction presented in [Wan99]. The third
part of his construction is supposed to reduce the result from $\$sy\$$, where $y$ is the result of
the Turing machine computation, to the protocol of the Turing machine that is needed to
prove that non-deterministic semi-Thue systems are DistNP-hard.

This time we have to deviate from [Wan99]: we need a different set of rules because 
we actually need the output of the machine, and not the protocol. Thus, our version of R3 
looks like the following:

$\$\,\overline{h}\,\overline{u} \to \$\,\overline{u}\,s_5$,
$s_5\,\overline{u}\,\$ \to \overline{u}\,s_6\,\$$,
$\overline{u}\,s_6 \to s_6\,u$,
$\$\,s_6 \to h$.

This transformation can be carried out in at most $2|y| + 1$ steps.

These rules simply translate $\overline{y}$ back into the original $y$ and add $h$ in front of the output,
thus achieving the actual output configuration of the original Turing machine $M$.

The following lemma is now obvious. 

Lemma 5.4. For a deterministic Turing machine $M$ and its corresponding semi-Thue
system $R_M$,

$M(x) = y$ if and only if $sx\$ \Rightarrow^{*,t}_{R_M} hy\$$,
where $t = T + 2|x| + 2|y| + 2$, $T$ being the running time of $M$ on $x$.
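
As a quick consistency check, the step bound in Definition 5.1 matches Lemma 5.4 under two assumptions about the simulated machine: it is length-preserving, so $|y| = |x|$, and its running time satisfies $T \le |x|^2$ (one reading of "quadratic time"; the constant 1 is an assumption made here for illustration).

```latex
\begin{align*}
t &= T + 2|x| + 2|y| + 2 \\
  &\le |x|^2 + 2|x| + 2|x| + 2 \\
  &= |x|^2 + 4|x| + 2 .
\end{align*}
```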

Again, the rest of the proof follows the lines of [Gol99]. There is a constant probability
(for the uniform distribution, it is proportional to $2^{-|\Gamma|}/|\Gamma|^2$) that any given semi-Thue system
$\Gamma$ appears as the first part of the input. Suppose that $g$ is a length-preserving one-way function.
By [Gol99], we can safely assume that there is a Turing machine $M_g$ that computes $g$ and
runs in quadratic time. By Lemma 5.4, there exists a semi-Thue system $R_M$ such that
$sx\$ \Rightarrow^*_{R_M} hy\$$ is equivalent to $g(x) = y$. Therefore, with constant probability solving STAF
is equivalent to inverting $g$. □



6. A complete one-way function based on Post Correspondence 

In this section, we describe a one-way function based on the Post Correspondence 
Problem and prove that it is complete. The function is defined as follows. 

Definition 6.1. The Post Transformation function (PTF) is the function $f : A^* \to A^*$
defined as follows:

• if the input has the form $((g_1, h_1), \ldots, (g_m, h_m), x)$, consider the derivation system
$\Gamma = ((g_1, h_1), \ldots, (g_m, h_m))$ and:

- if $x \vdash^{*,n^4}_\Gamma y$, there are no rewriting rules in $\Gamma$ that may be applied to $y$, and
$|y| = |x|$, then $f(\Gamma, x) = (\Gamma, y)$;

- otherwise, $f$ returns its input;

• otherwise, $f$ returns its input.

Now, we reduce the computation of a universal Turing machine to Post Correspondence
in the way described in [Gur91].

Theorem 6.2. If one-way functions exist, then PTF is a weakly one-way function. 

Proof. As usual, let $Q$ be the set of states of a Turing machine $M$, $s$ the initial state of
$M$, $h$ the halting state, $\pi_M$ the transition function of $M$, and 0, 1, $B$ the tape symbols.
For all symbols we use the dynamic binary coding scheme described in Section 5.
We now present the construction of a derivation set $\Gamma_M$.
(1) For every tape symbol $x$:

$(x, x)$.

(2) For each state $q \in Q_M \setminus \{h\}$, $p \in Q$, $a, b \in \{0, 1\}$ and rule $\pi_M(q, a) = (p, b, R)$:

$(qa, bp)$.

(3) For each state $q \in Q_M \setminus \{h\}$, $p \in Q$, $b \in \{0, 1\}$ and rule $\pi_M(q, B) = (p, b, R)$:

$(qB, bpB)$.

(4) For each state $q \in Q_M \setminus \{h\}$, $p \in Q$, $a, b, c \in \{0, 1\}$ and rule $\pi_M(q, a) = (p, b, L)$:

$(cqa, pcb)$.

(5) For each state $q \in Q_M \setminus \{h\}$, $p \in Q$, $b, c \in \{0, 1\}$ and rule $\pi_M(q, B) = (p, b, L)$:

$(cqB, pcbB)$.

The configuration of $M$ after $t$ steps of computation is represented by a string $xqy$, where
$q$ is the current state of $M$, $x$ is the tape before the head, and $y$ is the tape from the
head to the first blank symbol. The simulation of a step of $M$ from a configuration $xqy$
consists of at most $|x|$ applications of rule (1), followed by one application of one of the
rules (2)-(5), followed by $|y| - 1$ applications of rule (1). Note that before an application of a
rule that moves the head to the left one could also apply rule (1). If the Turing machine $M$ is
deterministic, then this "wrong" application leads to a situation where no rule from $\Gamma_M$ is
applicable. Thus, we have the following lemma.

Lemma 6.3. For a deterministic Turing machine $M$ with running time at most $n^2$ and its
corresponding Post Transformation system $\Gamma_M$,

$M(x) = y$ if and only if $sxB \vdash^{*,n^4}_{\Gamma_M} hyB$.

As usual, the rest of the proof closely follows [Gol99]. Suppose that $g$ is a
length-preserving one-way function that works for time not exceeding $n^2$. By Lemma 6.3, there
exists a finite system of pairs $\Gamma_M$ such that $sxB \vdash^{*,n^4}_{\Gamma_M} hyB$ is equivalent to $g(x) = y$.
Therefore, with constant probability solving PTF is equivalent to inverting $g$. □

Remark 6.4. Note the slight change in distributions on inputs and outputs: PTF accepts
as input $\overline{x}$ and outputs $\overline{y}$, while the emulated machine $g$ accepts $x$ and outputs $y$. Such
"tiny details" often hold the devil of average-case reasoning. Fortunately, distributions on
$x$ and $\overline{x}$ can be transformed from one to another by a polynomial algorithm, so PTF is still
a weak one-way function (see [Gol99] for details).

7. Complete one-way functions and DistNP-hard combinatorial problems 

Both our constructions of a complete one-way function look very similar to the
construction of the Tiling complete one-way function. This naturally leads to the question:
in what other combinatorial settings can one apply the same reasoning to find a complete
one-way function?

The whole point of this proof is to keep the function both length-preserving and easily 
computable. Obvious functions fall into one of two classes. 
(1) Easily computable, but not length-preserving. For any DistNP-hard problem, one can
construct a hard-to-invert function / that transfers protocols of this problem into its 
results. This function is hard to invert on average, but it does not preserve length, 
and thus it is impossible to translate a uniform distribution on outputs of / into a 
reasonable distribution on its inputs. The reader is welcome to think of a reasonably 
uniform distribution on proper tilings that would result in a reasonably uniform 
distribution on their upper rows; we believe that to construct such a distribution is 
either impossible or requires a major new insight. 

(2) Length-preserving, but hard to compute. Take a DistNP-hard problem and consider 
the function that sends its input into its output (e.g. the lowest row of the tiling 
into its uppermost row). This function is hard to invert and length-preserving, but 
it is also hard to compute, because to compute it one needs to solve Tiling. 

Following Levin, we get around these obstacles by having a deterministic version of
a DistNP-hard problem. This time, a Tiling problem produces nontrivial results only if
there always is only one proper tile to attach. Similarly, in Section 5 we demanded that
there is only one rewriting rule that is applicable on each step (we introduced $\Rightarrow^*_R$ for this
very purpose). In Section 6 we slightly generalized this idea of determinism, allowing
fixed-length deterministic backtracking. However, if for all $z \in f^{-1}(y)$ we can do this deterministic
procedure, then we can easily invert $f$. So we need that for most $z$ an indeterminism appears
and the procedure returns $z$.

A combinatorial problem should have two properties in order to hold a complete one- 
way function. 

(1) It should have a deterministic restricted version, like Tiling, string rewriting and 
modified Post Correspondence. 

(2) Its deterministic version should be powerful enough to simulate a deterministic 
Turing machine. For example, natural deterministic Post Correspondence (without 
any backtrack) is, of course, easy to formulate, but does not seem to be powerful 
enough. 

Keeping in mind these properties, one is welcome to look for other combinatorial settings 
with combinatorial complete one-way functions. 

8. Discussion and further work 

We have shown a new complete one-way function and discussed possibilities of other 
combinatorial settings to hold complete one-way functions. These functions are combina- 
torial in nature and represent a step towards the easy-to-analyze complete cryptographic 
objects, much like SAT is a perfect complete problem for NP. 

However, we are still not quite there. Basically, we sample a Turing machine at random 
and hope to find precisely the hard one. This distinction is very important for practical 
implications of our constructions. We believe that constructing a complete cryptographic 
problem that has properties completely analogous to SAT requires a major new insight, and 
such a construction represents one of the most important challenges in modern cryptography. 

Another direction would be to find other similar combinatorial problems that can hold 
a complete one-way function. By looking at our one-way functions and Levin's Tiling, 
one could imagine that every DistNP-complete problem readily yields a complete one-way 
function. However, there is also this subtle requirement that the problem (or its appropriate 
restriction) should be deterministic (compare $\Rightarrow_R$ and $\Rightarrow^*_R$). It would be interesting to
restate this requirement as a formal restriction on the problem setting. This would require 
some new definitions and, perhaps, a more general and unified approach to combinatorial 
problems. 